22 research outputs found
Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training
Self-imitating feedback is an effective and learner-friendly method for
non-native learners in Computer-Assisted Pronunciation Training. Acoustic
characteristics of native utterances are extracted, transplanted onto the
learner's own speech input, and given back to the learner as corrective
feedback. Previous work focused on speech conversion using prosodic
transplantation techniques based on the PSOLA algorithm. Motivated by the
visual differences found in spectrograms of native and non-native speech, we
investigated applying a GAN to generate self-imitating feedback, exploiting
the generator's mapping ability learned through adversarial training. Because
this mapping is highly under-constrained, we also adopt a cycle consistency
loss to encourage the output to preserve the global structure shared by native
and non-native utterances. Trained on 97,200 spectrogram images of short
utterances produced by native and non-native speakers of Korean, the generator
successfully transforms non-native spectrogram input into a spectrogram with
the properties of self-imitating feedback. Furthermore, the transformed
spectrogram shows segmental corrections that cannot be obtained by prosodic
transplantation. A perceptual test comparing the self-imitating and correcting
abilities of our method against the PSOLA baseline shows that the generative
approach with cycle consistency loss is promising.
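As a rough illustration of the cycle-consistency idea described above, the following sketch computes the two reconstruction terms of the loss. The generators `G` and `F` here are toy stand-ins for the trained networks, and the vectors stand in for spectrogram frames; none of these names come from the paper.

```python
# Hypothetical sketch of the cycle-consistency objective used to constrain
# the spectrogram-to-spectrogram mapping: G maps non-native -> native-like,
# F maps back. The toy generators below are stand-ins for trained networks.

def l1_distance(a, b):
    """Mean absolute difference between two flat feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(G, F, x_nonnative, y_native):
    """L_cyc = |F(G(x)) - x| + |G(F(y)) - y| (an expectation over a
    batch in practice; a single sample is shown here)."""
    forward_cycle = l1_distance(F(G(x_nonnative)), x_nonnative)
    backward_cycle = l1_distance(G(F(y_native)), y_native)
    return forward_cycle + backward_cycle

# Toy generators: small shifts standing in for learned transformations.
G = lambda v: [u + 0.1 for u in v]
F = lambda v: [u - 0.1 for u in v]

x = [0.2, 0.5, 0.9]   # "non-native" spectrogram frame
y = [0.3, 0.6, 0.8]   # "native" spectrogram frame
loss = cycle_consistency_loss(G, F, x, y)
print(round(loss, 6))  # 0.0: F undoes G exactly in this toy setup
```

Driving this loss toward zero is what encourages the generator to preserve the global spectrogram structure while changing only the properties that differ between native and non-native speech.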
Automatic Severity Assessment of Dysarthric speech by using Self-supervised Model with Multi-task Learning
Automatic assessment of dysarthric speech is essential for sustained
treatment and rehabilitation. However, obtaining atypical speech is
challenging, often leading to data scarcity. To tackle this problem, we
propose a novel automatic severity assessment method for dysarthric speech
that uses a self-supervised model in conjunction with multi-task learning.
Wav2vec 2.0 XLS-R is jointly trained on two tasks: severity level
classification and auxiliary automatic speech recognition (ASR). For the
baseline experiments, we employ hand-crafted features such as eGeMAPS and
linguistic features with SVM, MLP, and XGBoost classifiers. Evaluated on the
Korean dysarthric speech QoLT database, our model outperforms the traditional
baselines, with a 4.79% relative improvement in classification accuracy. In
addition, the proposed model surpasses the model trained without the ASR
head by a 10.09% relative improvement. Furthermore, we show how multi-task
learning affects severity classification performance by analyzing the latent
representations and the regularization effect.
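The joint-training setup above can be pictured as a weighted sum of the two task losses. The sketch below is a hedged illustration, not the paper's training code: plain cross-entropy stands in for the ASR objective (CTC in practice), and the weight `alpha` is a hypothetical knob.

```python
import math

# Hedged sketch of a multi-task objective: a weighted combination of the
# severity-classification loss and an auxiliary ASR loss. All values and
# the weighting scheme are illustrative assumptions, not the paper's.

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class,
    computed with the max-shift trick for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def multitask_loss(severity_logits, severity_label,
                   asr_loss_value, alpha=0.3):
    """total = (1 - alpha) * L_severity + alpha * L_asr."""
    l_sev = cross_entropy(severity_logits, severity_label)
    return (1 - alpha) * l_sev + alpha * asr_loss_value

# One toy step: 3 severity classes, a precomputed ASR loss value.
loss = multitask_loss([2.0, 0.5, -1.0], 0, asr_loss_value=1.2)
print(round(loss, 4))
```

Because the gradients of both terms flow into the shared Wav2vec 2.0 encoder, the auxiliary ASR head can act as a regularizer on the representations used for severity classification, which is the effect the abstract analyzes.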
Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification
This paper proposes an improved Goodness of Pronunciation (GoP) measure that
utilizes Uncertainty Quantification (UQ) for automatic speech intelligibility
assessment of dysarthric speech. Current GoP methods rely heavily on
overconfident neural-network predictions, which makes them unsuitable for
assessing dysarthric speech, given its significant acoustic differences from
healthy speech. To alleviate this problem, UQ techniques were applied to GoP
by 1) normalizing the phoneme prediction (entropy, margin, maxlogit,
logit-margin) and 2) modifying the scoring function (scaling, prior
normalization). As a result, prior-normalized maxlogit GoP achieves the best
performance, with relative increases of 5.66%, 3.91%, and 23.65% over the
baseline GoP for English, Korean, and Tamil, respectively. Furthermore, a
phoneme analysis is conducted to identify which phoneme scores correlate
significantly with intelligibility scores in each language.
Comment: Accepted to Interspeech 202
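To make the contrast between posterior-based GoP and the logit-based variants concrete, here is an illustrative sketch (not the paper's exact formulas) of a baseline softmax-posterior GoP score and two of the normalization styles named above, maxlogit and margin, computed from one frame of toy acoustic-model logits:

```python
import math

# Illustrative phoneme-level scores from acoustic-model logits. `frame`
# holds one frame's logits over a toy 4-phoneme inventory; all numbers
# are made up for demonstration.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def gop_posterior(logits, canonical):
    """Baseline-style GoP: log posterior of the canonical phoneme."""
    return math.log(softmax(logits)[canonical])

def gop_maxlogit(logits, canonical):
    """Maxlogit-style score: canonical logit relative to the frame max,
    which avoids the softmax's overconfident squashing."""
    return logits[canonical] - max(logits)

def gop_margin(logits, canonical):
    """Margin-style score: canonical logit minus the best competitor."""
    competitors = [l for i, l in enumerate(logits) if i != canonical]
    return logits[canonical] - max(competitors)

frame = [1.2, 3.5, 0.4, 2.9]   # toy logits over 4 phonemes
print(gop_maxlogit(frame, canonical=1))  # 0.0: canonical is the argmax
print(round(gop_margin(frame, canonical=1), 4))
```

The point of the logit-based variants is that they do not force scores through a softmax that can be overconfident on acoustically atypical dysarthric speech.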
Automatic Pronunciation Assessment of Korean Spoken by L2 Learners Using Best Feature Set Selection
This paper proposes a method for automatic pronunciation assessment of Korean spoken by L2 learners by selecting the best feature set from a collection of the most well-known features in the literature. The L2 Korean Speech Corpus is used for assessment modeling; the native languages of the L2 learners are English, Chinese, Japanese, Russian, and Mongolian. In our system, learners' speech is forced-aligned and recognized using a native Korean acoustic model. Based on these results, various features for pronunciation assessment are computed and divided into four categories: RATE, SEGMENT, SILENCE, and GOP. Pronunciation scores produced by combining the feature categories with multiple linear regression are used as a baseline. To improve on the baseline, relevant features are selected using Principal Component Regression (PCR) and Best Subset Selection (BSS), respectively. The results show that the BSS model outperforms both the baseline and the PCR model, and that features corresponding to speech segments and rate are selected as the most relevant ones for automatic pronunciation assessment. The observed tendency of salient features will be useful for further improvement of automatic pronunciation assessment models for Korean language learners.
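The Best Subset Selection step can be sketched as an exhaustive search over feature subsets, each scored against the target pronunciation scores. In this hedged illustration the scorer is a stand-in correlation metric and the feature values are toy data, not the paper's regression pipeline or corpus:

```python
from itertools import combinations

# Hypothetical Best Subset Selection sketch: enumerate feature subsets
# up to a size limit and keep the subset with the best score.

def score_subset(features, subset, targets):
    """Toy score: mean absolute Pearson correlation between each selected
    feature column and the target scores (higher is better)."""
    if not subset:
        return 0.0
    total = 0.0
    for name in subset:
        col = features[name]
        mean_c = sum(col) / len(col)
        mean_t = sum(targets) / len(targets)
        cov = sum((c - mean_c) * (t - mean_t) for c, t in zip(col, targets))
        sd_c = sum((c - mean_c) ** 2 for c in col) ** 0.5
        sd_t = sum((t - mean_t) ** 2 for t in targets) ** 0.5
        total += abs(cov / (sd_c * sd_t)) if sd_c and sd_t else 0.0
    return total / len(subset)

def best_subset(features, targets, max_size=2):
    """Exhaustively search all subsets up to max_size features."""
    names = sorted(features)
    best = (0.0, ())
    for k in range(1, max_size + 1):
        for subset in combinations(names, k):
            best = max(best, (score_subset(features, subset, targets), subset))
    return best

features = {                        # made-up feature columns
    "RATE":    [1.0, 2.0, 3.0, 4.0],   # tracks the target closely
    "SILENCE": [0.3, 0.1, 0.4, 0.1],   # mostly noise
}
targets = [1.1, 2.1, 2.9, 4.2]      # made-up pronunciation scores
score, chosen = best_subset(features, targets)
print(chosen)
```

The exhaustive search is exponential in the number of features, which is why BSS is practical here only because the candidate feature pool is small; with larger pools, stepwise or regularized selection is the usual substitute.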
Analysis on Difference between Speaking Rates of Phoneme Classes and Oral Proficiency of Korean English Learners
Automatic Fluency Assessment of Korean Learners of English Using Articulation-based Phoneme-level Posterior Probabilities
Segmental Pronunciation Teaching Priorities for a Korean CAPT System: Focusing on the Variation Patterns of Chinese- and Japanese-speaking Learners
Assistive Program for Automatic Speech Transcription based on G2P conversion and Speech Recognition
Optimizing Vocabulary Modeling for Dysarthric Speech Recognition
Imperfect articulation in dysarthric speech degrades the performance of speech recognition. In this paper, the effect of the articulatory class of phonemes on dysarthric speech recognition results is analyzed using generalized linear mixed models (GLMMs). The analysis selects as best the model whose features are categorized by manner of articulation and place of the tongue. A recognition accuracy score for each word is then predicted from its pronunciation and the GLMM. The vocabulary optimized by selecting the words with the maximum scores shows a 16.4% relative error reduction in dysarthric speech recognition.
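The vocabulary-optimization step above amounts to scoring each candidate word from its phoneme makeup and keeping the highest-scoring words. The sketch below is a hedged stand-in: a simple additive per-class penalty replaces the fitted GLMM, and every weight and word is a made-up illustrative value.

```python
# Hypothetical per-class difficulty weights standing in for GLMM estimates.
ARTICULATION_PENALTY = {
    "fricative": 0.30,
    "affricate": 0.25,
    "stop":      0.10,
    "nasal":     0.05,
    "vowel":     0.02,
}

def predict_word_score(phoneme_classes):
    """Toy predicted recognition accuracy: 1 minus the mean per-phoneme
    penalty (unknown classes get a default penalty of 0.2)."""
    penalty = sum(ARTICULATION_PENALTY.get(c, 0.2) for c in phoneme_classes)
    return max(0.0, 1.0 - penalty / max(len(phoneme_classes), 1))

def optimize_vocabulary(candidates, size):
    """Keep the `size` candidate words with the highest predicted scores."""
    ranked = sorted(candidates,
                    key=lambda w: predict_word_score(candidates[w]),
                    reverse=True)
    return ranked[:size]

candidates = {   # made-up words with their phoneme articulation classes
    "mama": ["nasal", "vowel", "nasal", "vowel"],
    "fuzz": ["fricative", "vowel", "fricative"],
    "top":  ["stop", "vowel", "stop"],
}
print(optimize_vocabulary(candidates, size=2))
```

The design choice mirrors the abstract: rather than adapting the recognizer to the speech, the command vocabulary itself is shaped to avoid phoneme classes that dysarthric articulation renders unreliable.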